Data acquisition and cost-effective predictive modeling: targeting offers for electronic commerce
Electronic commerce is revolutionizing the way we think about
data modeling, by making it possible to integrate the processes of
(costly) data acquisition and model induction. The opportunity for
improving modeling through costly data acquisition presents itself
for a diverse set of electronic commerce modeling tasks, from personalization
to customer lifetime value modeling; we illustrate with
the running example of choosing offers to display to web-site visitors,
which captures important aspects in a familiar setting. Considering
data acquisition costs explicitly allows predictive models to be built
at significantly lower cost, and a modeler may
be able to improve performance via new sources of information that
previously were too expensive to consider. However, existing techniques
for integrating modeling and data acquisition cannot deal
with the rich environment that electronic commerce presents. We
discuss several possible data acquisition settings, the challenges involved
in the integration with modeling, and various research areas
that may supply parts of an ultimate solution. We also present and
briefly demonstrate a unified framework within which one can integrate
acquisitions of different types, with any cost structure and
any predictive modeling objective.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
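The cost-aware acquisition idea above can be sketched as a greedy ranking by utility per unit cost, so acquisitions of different types compete within one budget. This is an illustrative sketch, not the paper's framework; the candidate names, utility estimates, and costs below are invented stand-ins.

```python
# Hypothetical sketch of cost-aware acquisition ranking: candidates are
# scored by estimated improvement in model utility per unit cost, so
# heterogeneous acquisition types (labels, features, new information
# sources) can compete within one budget. All numbers are stand-ins.

def rank_acquisitions(candidates, budget):
    """candidates: list of (name, est_utility_gain, cost). Greedily pick
    by gain/cost ratio until the budget is exhausted."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, spent = [], 0.0
    for name, gain, cost in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

candidates = [
    ("label_instance_17", 0.030, 1.0),    # a single label purchase
    ("feature_income_all", 0.120, 10.0),  # bulk feature acquisition
    ("clickstream_feed", 0.200, 25.0),    # a new information source
]
print(rank_acquisitions(candidates, budget=12.0))
# → (['label_instance_17', 'feature_income_all'], 11.0)
```

A real system would estimate the utility gains from the current model rather than assume them, but the greedy ratio rule shows how a single cost structure can cover heterogeneous acquisition types.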
Ask the GRU: Multi-Task Learning for Deep Text Recommendations
In a variety of application domains the content to be recommended to users is
associated with text. This includes research papers, movies with associated
plot summaries, news articles, blog posts, etc. Recommendation approaches based
on latent factor models can be extended naturally to leverage text by employing
an explicit mapping from text to factors. This enables recommendations for new,
unseen content, and may generalize better, since the factors for all items are
produced by a compactly-parametrized model. Previous work has used topic models
or averages of word embeddings for this mapping. In this paper we present a
method leveraging deep recurrent neural networks to encode the text sequence
into a latent vector, specifically gated recurrent units (GRUs) trained
end-to-end on the collaborative filtering task. For the task of scientific
paper recommendation, this yields models with significantly higher accuracy. In
cold-start scenarios, we beat the previous state-of-the-art, all of which
ignore word order. Performance is further improved by multi-task learning,
where the text encoder network is trained for a combination of content
recommendation and item metadata prediction. This regularizes the collaborative
filtering model, ameliorating the problem of sparsity of the observed rating
matrix.
Comment: 8 pages
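The core mechanism described above can be sketched in a few lines: a GRU reads an item's text and its final hidden state serves as the item's latent factor, scored against a user factor by dot product. This is a minimal forward-pass sketch with random, untrained weights, not the authors' implementation; dimensions and names are illustrative.

```python
import numpy as np

# Minimal sketch (not the authors' implementation): a GRU encodes a
# token sequence into a latent item factor, and the recommendation
# score is the dot product with a user factor. In the paper this whole
# pipeline is trained end-to-end on the collaborative filtering task;
# here the weights are random and untrained.

rng = np.random.default_rng(0)
d_emb, d_hid = 8, 4   # illustrative embedding and hidden sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# GRU parameters for the update gate z, reset gate r, and candidate state.
Wz, Uz = rng.normal(size=(d_hid, d_emb)), rng.normal(size=(d_hid, d_hid))
Wr, Ur = rng.normal(size=(d_hid, d_emb)), rng.normal(size=(d_hid, d_hid))
Wh, Uh = rng.normal(size=(d_hid, d_emb)), rng.normal(size=(d_hid, d_hid))

def gru_encode(token_embs):
    """Run a GRU over a sequence of token embeddings; return final state."""
    h = np.zeros(d_hid)
    for x in token_embs:
        z = sigmoid(Wz @ x + Uz @ h)            # update gate
        r = sigmoid(Wr @ x + Ur @ h)            # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
        h = (1 - z) * h + z * h_tilde
    return h

title = rng.normal(size=(5, d_emb))   # 5 tokens of item text
item_factor = gru_encode(title)       # text-derived item factor
user_factor = rng.normal(size=d_hid)
score = float(user_factor @ item_factor)   # CF prediction for this pair
```

Because the item factor is computed from text rather than looked up, the same model scores brand-new items, which is what enables the cold-start recommendations the abstract describes.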
Aspect-based Sentiment Analysis of Scientific Reviews
Scientific papers are complex and understanding the usefulness of these
papers requires prior knowledge. Peer reviews are comments on a paper provided
by designated experts in that field, and they hold a substantial amount of
information, useful not only for the editors and chairs making the final
decision but also for judging the paper's potential impact. In this paper, we
propose using aspect-based sentiment analysis of scientific reviews to extract
information that correlates well with the accept/reject decision.
Working with a dataset of close to 8k reviews from ICLR, one of the top
conferences in machine learning, we use an active learning
framework to build a training dataset for aspect prediction, which is further
used to obtain the aspects and sentiments for the entire dataset. We show that
the distribution of aspect-based sentiments obtained from a review is
significantly different for accepted and rejected papers. The aspect
sentiments from these reviews support an intriguing observation: certain
aspects present in a paper and discussed in the review strongly determine the
final recommendation. As a second objective, we quantify the extent of
disagreement among the reviewers refereeing a paper. We also investigate the
extent of disagreement between the reviewers and the chair and find that the
inter-reviewer disagreement may have a link to the disagreement with the chair.
One of the most interesting observations from this study is that reviews in
which the reviewer's score is consistent with the aspect sentiments extracted
from the review text are also more likely to agree with the chair's decision.
Comment: Accepted in JCDL'2
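The aggregation step behind the accepted-vs-rejected comparison can be sketched simply: each review yields (aspect, sentiment) pairs, and per-aspect positive-sentiment rates are compared across the two groups. The aspect names and counts below are illustrative, not drawn from the ICLR dataset.

```python
from collections import Counter

# Hypothetical sketch of the aggregation described above: turn each
# review's (aspect, sentiment) pairs into a per-aspect distribution of
# positive mentions, then compare accepted vs. rejected papers. The
# toy reviews below are invented for illustration.

def aspect_distribution(reviews):
    """reviews: list of reviews, each a list of (aspect, sentiment)
    pairs with sentiment in {+1, -1}. Returns the fraction of positive
    mentions per aspect."""
    pos, tot = Counter(), Counter()
    for review in reviews:
        for aspect, sent in review:
            tot[aspect] += 1
            if sent > 0:
                pos[aspect] += 1
    return {a: pos[a] / tot[a] for a in tot}

accepted = [[("novelty", +1), ("clarity", +1)],
            [("novelty", +1), ("soundness", -1)]]
rejected = [[("novelty", -1), ("clarity", -1)],
            [("clarity", +1), ("soundness", -1)]]

print(aspect_distribution(accepted))  # novelty skews positive
print(aspect_distribution(rejected))  # novelty skews negative
```

With real data, a significance test on these per-aspect distributions is what supports the claim that accepted and rejected papers differ.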
Creating Diverse Ensemble Classifiers
Ensemble methods like Bagging and Boosting, which combine the decisions of multiple hypotheses, are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and Bagging. DECORATE also obtains higher accuracy than Boosting early in the learning curve when training data is limited. We propose to show that DECORATE can also be effectively used for (1) active learning, to reduce the number of training examples required to achieve high accuracy; (2) exploiting unlabeled data to improve accuracy in a semi-supervised learning setting; (3) combining active learning with semi-supervision for improved results; (4) obtaining better class membership probability estimates; (5) reducing the error of regressors; and (6) improving the accuracy of relational learners.
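The DECORATE loop described above can be sketched for the binary case: draw artificial examples from a Gaussian fit to the data, label them opposite to the current ensemble's prediction, and keep a new member only if it does not hurt ensemble training error. This toy sketch substitutes a nearest-centroid learner for the decision trees used in the paper so the example stays dependency-free, and it breaks on the first rejected candidate where the full algorithm retries.

```python
import numpy as np

# Toy sketch of the DECORATE loop (binary case). A nearest-centroid
# base learner stands in for decision-tree induction so the example is
# self-contained; artificial examples get labels *opposite* to the
# current ensemble's predictions, pushing each new member to disagree
# with the committee.

rng = np.random.default_rng(1)

def fit_centroid(X, y):
    """Train a nearest-centroid binary classifier; return a predictor."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    return lambda Z: (np.linalg.norm(Z - c1, axis=1)
                      < np.linalg.norm(Z - c0, axis=1)).astype(int)

def ensemble_predict(members, Z):
    """Majority vote over committee members."""
    votes = np.mean([m(Z) for m in members], axis=0)
    return (votes >= 0.5).astype(int)

def decorate(X, y, n_members=5, n_artificial=20):
    members = [fit_centroid(X, y)]
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
    while len(members) < n_members:
        X_art = rng.normal(mu, sigma, size=(n_artificial, X.shape[1]))
        y_art = 1 - ensemble_predict(members, X_art)  # oppositional labels
        candidate = fit_centroid(np.vstack([X, X_art]),
                                 np.concatenate([y, y_art]))
        before = np.mean(ensemble_predict(members, X) != y)
        after = np.mean(ensemble_predict(members + [candidate], X) != y)
        if after <= before:   # keep only if training error is not hurt
            members.append(candidate)
        else:
            break             # (the full algorithm retries instead)
    return members

# Two well-separated Gaussian blobs as toy training data.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
ens = decorate(X, y)
acc = np.mean(ensemble_predict(ens, X) == y)
```

The oppositional labels are the key design choice: each candidate is trained to contradict the committee where no real data constrains it, which manufactures the diversity that Bagging gets only indirectly from resampling.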
Creating diverse ensemble classifiers to reduce supervision
Ensemble methods like Bagging and Boosting, which combine the decisions of multiple
hypotheses, are some of the strongest existing machine learning methods. The diversity
of the members of an ensemble is known to be an important factor in determining its generalization
error. In this thesis, we present a new method for generating ensembles, DECORATE
(Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples),
that directly constructs diverse hypotheses using additional artificially-generated
training examples. The technique is a simple, general meta-learner that can use any strong
learner as a base classifier to build diverse committees. The diverse ensembles produced by
DECORATE are very effective for reducing the amount of supervision required for building
accurate models. The first task we demonstrate this on is classification given a fixed
training set. Experimental results using decision-tree induction as a base learner demonstrate
that our approach consistently achieves higher predictive accuracy than the base classifier,
Bagging and Random Forests. Also, DECORATE attains higher accuracy than Boosting
on small training sets, and achieves comparable performance on larger training sets. Additional
experiments demonstrate DECORATE’s resilience to imperfections in data, in the
form of missing features, classification noise, and feature noise.
DECORATE ensembles can also be used to reduce supervision through active learning,
in which the learner selects the most informative examples from a pool of unlabeled
examples, such that acquiring their labels will increase the accuracy of the classifier. Query
by Committee is one effective approach to active learning in which disagreement within
the ensemble of hypotheses is used to select examples for labeling. Query by Bagging and
Query by Boosting are two practical implementations of this approach that use Bagging and
Boosting respectively, to build the committees. For efficient active learning it is critical that
the committee be made up of consistent hypotheses that are very different from each other.
Since DECORATE explicitly builds such committees, it is well-suited for this task. We introduce
a new algorithm, ACTIVEDECORATE, which uses DECORATE committees to select
good training examples. Experimental results demonstrate that ACTIVEDECORATE typically
requires labeling fewer examples to achieve the same accuracy as Query by Bagging
and Query by Boosting. Apart from optimizing classification accuracy, in many applications,
producing good class probability estimates is also important, e.g., in fraud detection,
which has unequal misclassification costs. This thesis introduces a novel approach to active
learning based on ACTIVEDECORATE which uses Jensen-Shannon divergence (a similarity
measure for probability distributions) to improve the selection of training examples for
optimizing probability estimation. Comprehensive experimental results demonstrate the
benefits of our approach.
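The selection criterion this paragraph describes, Jensen-Shannon divergence among committee members' predicted class distributions, is compact enough to show directly. This is a generic JSD sketch, not the thesis code; the two toy committees below are invented.

```python
import numpy as np

# Sketch of the probability-based selection criterion described above:
# Jensen-Shannon divergence among committee members' predicted class
# distributions for one unlabeled example. Higher divergence means more
# committee disagreement, hence a more informative example to label.

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def js_divergence(dists):
    """dists: (n_members, n_classes) predicted distributions for one
    example. JSD = H(mean of dists) - mean of H(dists); it is zero
    exactly when all members output the same distribution."""
    dists = np.asarray(dists, dtype=float)
    return entropy(dists.mean(axis=0)) - np.mean([entropy(d) for d in dists])

agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]   # committee agrees
split = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # committee disagrees

print(js_divergence(agree))   # ~0: no disagreement, low priority
print(js_divergence(split))   # large: strong candidate for labeling
```

Unlike a raw vote count, JSD uses the full predicted distributions, which is why it suits the probability-estimation objective the paragraph emphasizes.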
Unlike the active learning setting, in many learning problems the class labels for
all instances are known, but feature values may be missing and can be acquired at a cost.
For building accurate predictive models, acquiring complete information for all instances is
often quite expensive, while acquiring information for a random subset of instances may not
be optimal. We formalize the task of active feature-value acquisition, which tries to reduce
the cost of achieving a desired model accuracy by identifying instances for which obtaining
complete information is most informative. We present an approach, based on DECORATE,
in which instances are selected for acquisition based on the current model’s accuracy and
its confidence in the prediction. Experimental results demonstrate that our approach can
induce accurate models using substantially fewer feature-value acquisitions than random
sampling.
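The selection rule summarized in this last paragraph, preferring incomplete instances the current model gets wrong or is least confident about, can be sketched as a simple ranking. The scoring rule and all numbers below are illustrative stand-ins, not the thesis's exact formulation.

```python
import numpy as np

# Illustrative sketch of active feature-value acquisition ordering:
# among instances with known labels but missing feature values, acquire
# first for those the current model misclassifies (most confidently
# wrong first), then for correct ones in order of low confidence.

def acquisition_order(labels, preds, confidence):
    """Rank instances for feature-value acquisition (most urgent first)."""
    labels, preds, conf = map(np.asarray, (labels, preds, confidence))
    wrong = preds != labels
    # Misclassified: score = 1 + confidence (confidently wrong is worst);
    # correctly classified: score = 1 - confidence (uncertain next).
    score = np.where(wrong, 1.0 + conf, 1.0 - conf)
    return np.argsort(-score)   # descending score = acquisition priority

order = acquisition_order(
    labels=[1, 0, 1, 1],
    preds=[1, 1, 1, 0],          # instances 1 and 3 are misclassified
    confidence=[0.95, 0.60, 0.55, 0.90],
)
print(list(order))   # → [3, 1, 2, 0]
```

This captures the intuition in the paragraph: completing features for instances where the model is confidently wrong corrects it fastest, so fewer acquisitions are needed than with random sampling.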